Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D reconstruction and object recognition. Corner detection overlaps with the topic of interest point detection.
An interest point is a point in an image which has a well-defined position and can be robustly detected. This means that an interest point can be a corner but it can also be, for example, an isolated point of local intensity maximum or minimum, line endings, or a point on a curve where the curvature is locally maximal.
In practice, most so-called corner detection methods detect interest points in general, and in fact, the term "corner" and "interest point" are used more or less interchangeably through the literature. As a consequence, if only corners are to be detected it is necessary to do a local analysis of detected interest points to determine which of these are real corners. Examples of edge detection that can be used with post-processing to detect corners are the Kirsch operator and the Frei-Chen masking set.Linda Shapiro and George C. Stockman (2001). Computer Vision, p. 257. Prentice Books, Upper Saddle River. .
"Corner", "interest point" and "feature" are used interchangeably in literature, confusing the issue. Specifically, there are several blob detection that can be referred to as "interest point operators", but which are sometimes erroneously referred to as "corner detectors". Moreover, there exists a notion of ridge detection to capture the presence of elongated objects.
Corner detectors are not usually very robust and often require large redundancies introduced to prevent the effect of individual errors from dominating the recognition task.
One determination of the quality of a corner detector is its ability to detect the same corner in multiple similar images, under conditions of different lighting, translation, rotation and other transforms.
A simple approach to corner detection in images is using correlation, but this gets very computationally expensive and suboptimal. An alternative approach used frequently is based on a method proposed by Harris and Stephens (below), which in turn is an improvement of a method by Moravec.
If the pixel is in a region of uniform intensity, then the nearby patches will look similar. If the pixel is on an edge, then nearby patches in a direction perpendicular to the edge will look quite different, but nearby patches in a direction parallel to the edge will result in only a small change. If the pixel is on a feature with variation in all directions, then none of the nearby patches will look similar.
The corner strength is defined as the smallest SSD between the patch and its neighbours (horizontal, vertical and on the two diagonals). The reason is that if this number is high, then the variation along all shifts is either equal to it or larger than it, so capturing that all nearby patches look different.
If the corner strength number is computed for all locations, that it is locally maximal for one location indicates that a feature of interest is present in it.
As pointed out by Moravec, one of the main problems with this operator is that it is not isotropic: if an edge is present that is not in the direction of the neighbours (horizontal, vertical, or diagonal), then the smallest SSD will be large and the edge will be incorrectly chosen as an interest point.Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover, Hans Moravec, March 1980, Computer Science Department, Stanford University (Ph.D. thesis).
Without loss of generality, we will assume a grayscale 2-dimensional image is used. Let this image be given by . Consider taking an image patch over the area and shifting it by . The weighted sum of squared differences (SSD) between these two patches, denoted , is given by can be approximated by a Taylor series. Let and be the partial image derivative of , such that
This produces the approximation which can be written in matrix form: where A is the structure tensor,
In words, we find the covariance of the partial derivative of the image intensity with respect to the and axes.
Angle brackets denote averaging (i.e. summation over ), and denotes the type of window that slides over the image. If a Box blur is used, the response will be Anisotropy, but if a Gaussian is used, then the response will be isotropic.
A corner (or in general an interest point) is characterized by a large variation of in all directions of the vector . By analyzing the eigenvalues of , this characterization can be expressed in the following way: should have two "large" eigenvalues for an interest point. Based on the magnitudes of the eigenvalues, the following inferences can be made based on this argument:
Harris and Stephens note that exact computation of the eigenvalues is computationally expensive, since it requires the computation of a square root, and instead suggest the function where is a tunable sensitivity parameter.
Therefore, the algorithm does not have to actually compute the eigenvalue decomposition of the matrix and instead it is sufficient to evaluate the determinant and trace of to find corners, or rather interest points in general.
The Shi–Tomasi corner detector directly computes because under certain assumptions, the corners are more stable for tracking. Note that this method is also sometimes referred to as the Kanade–Tomasi corner detector.
The value of has to be determined empirically, and in the literature values in the range 0.04–0.15 have been reported as feasible.
One can avoid setting the parameter by using Noble's corner measure which amounts to the harmonic mean of the eigenvalues: where is a small positive constant.
If can be interpreted as the precision matrix for the corner position, the covariance matrix for the corner position is , i.e.
The sum of the eigenvalues of , which in that case can be interpreted as a generalized variance (or a "total uncertainty") of the corner position, is related to Noble's corner measure as
The equation of a tangent line at pixel is given by:
where is the gradient vector of the image at .
The point closest to all the tangent lines in the window is:
The distance from to the tangent lines is weighted by the gradient magnitude, thus giving more importance to tangents passing through pixels with strong gradients.
Solving for :
are defined as:
Minimizing this equation can be done by differentiating with respect to and setting it equal to 0:
Note that is the structure tensor. For the equation to have a solution, must be invertible, which implies that must be full rank (rank 2). Thus, the solution
only exists where an actual corner exists in the window .
A methodology for performing automatic scale selection for this corner localization method has been presented by Lindeberg by minimizing the normalized residual
over scales. Thereby, the method has the ability to automatically adapt the scale levels for computing the image gradients to the noise level in the image data, by choosing coarser scale levels for noisy image data and finer scale levels for near ideal corner-like structures.
Notes:
With denoting the original image intensity, let denote the scale space representation of obtained by convolution with a Gaussian kernel
In practice, this multi-scale corner detector is often complemented by a scale selection step, where the scale-normalized Laplacian operator
These detectors are more completely described in blob detection. The scale-normalized Laplacian of the Gaussian and difference-of-Gaussian features (Lindeberg 1994, 1998; Lowe 2004)
The scale-normalized determinant of the Hessian operator (Lindeberg 1994, 1998)
The scale selection properties, affine transformation properties and experimental properties of these and other scale-space interest point detectors are analyzed in detail in (Lindeberg 2013, 2015).
The unsigned Hessian feature strength measure responds to local extrema by positive values and is not sensitive to saddle points, whereas the signed Hessian feature strength measure does additionally respond to saddle points by negative values. The unsigned Hessian feature strength measure is insensitive to the local polarity of the signal, whereas the signed Hessian feature strength measure responds to the local polarity of the signal by the sign of its output.
In Lindeberg (2015) these four differential entities were combined with local scale selection based on either scale-space extrema detection
By experiments on image matching under scaling transformations on a poster dataset with 12 posters with multi-view matching over scaling transformations up to a scaling factor of 6 and viewing direction variations up to a slant angle of 45 degrees with local image descriptors defined from reformulations of the pure image descriptors in the SIFT and SURF operators to image measurements in terms of Gaussian derivative operators (Gauss-SIFT and Gauss-SURF) instead of original SIFT as defined from an image pyramid or original SURF as defined from Haar wavelets, it was shown that scale-space interest point detection based on the unsigned Hessian feature strength measure allowed for the best performance and better performance than scale-space interest points obtained from the determinant of the Hessian . Both the unsigned Hessian feature strength measure , the signed Hessian feature strength measure and the determinant of the Hessian allowed for better performance than the Laplacian of the Gaussian . When combined with scale linking and complementary thresholding on , the signed Hessian feature strength measure did additionally allow for better performance than the Laplacian of the Gaussian .
Furthermore, it was shown that all these differential scale-space interest point detectors defined from the Hessian matrix allow for the detection of a larger number of interest points and better matching performance compared to the Harris and Shi-and-Tomasi operators defined from the structure tensor (second-moment matrix).
A theoretical analysis of the scale selection properties of these four Hessian feature strength measures and other differential entities for detecting scale-space interest points, including the Laplacian of the Gaussian and the determinant of the Hessian, is given in Lindeberg (2013) and an analysis of their affine transformation properties as well as experimental properties in Lindeberg (2015).
where is the unit vector perpendicular to the gradient, and determines how edge-phobic the detector is. The authors also note that smoothing (Gaussian is suggested) is required to reduce noise.
Smoothing also causes displacement of corners, so the authors derive an expression for the displacement of a 90 degree corner, and apply this as a correction factor to the detected corners.
For feature detection, SUSAN places a circular mask over the pixel to be tested (the nucleus). The region of the mask is , and a pixel in this mask is represented by . The nucleus is at . Every pixel is compared to the nucleus using the comparison function:
where is the brightness difference threshold, is the brightness of the pixel and the power of the exponent has been determined empirically. This function has the appearance of a smoothed top-hat or rectangular function. The area of the SUSAN is given by:
If is the rectangular function, then is the number of pixels in the mask which are within of the nucleus. The response of the SUSAN operator is given by:
where is named the 'geometric threshold'. In other words, the SUSAN operator only has a positive score if the area is small enough. The smallest SUSAN locally can be found using non-maximal suppression, and this is the complete SUSAN operator.
The value determines how similar points have to be to the nucleus before they are considered to be part of the univalue segment. The value of determines the minimum size of the univalue segment. If is large enough, then this becomes an Edge detection.
For corner detection, two further steps are used. Firstly, the centroid of the SUSAN is found. A proper corner will have the centroid far from the nucleus. The second step insists that all points on the line from the nucleus through the centroid out to the edge of the mask are in the SUSAN.
The response function is defined as:
This will be large when there is no direction in which the centre pixel is similar to two nearby pixels along a diameter. is a discretised circle (a Bresenham circle), so interpolation is used for intermediate diameters to give a more isotropic response. Since any computation gives an upper bound on the , the horizontal and vertical directions are checked first to see if it is worth proceeding with the complete computation of .
The first corner detection algorithm based on the AST is FAST (features from accelerated segment test). Although can in principle take any value, FAST uses only a value of 3 (corresponding to a circle of 16 pixels circumference), and tests show that the best results are achieved with being 9. This value of is the lowest one at which edges are not detected. The order in which pixels are tested is determined by the ID3 algorithm from a training set of images. Confusingly, the name of the detector is somewhat similar to the name of the paper describing Trajkovic and Hedley's detector.
Then, for a suitable choice of ,
spatio-temporal interest points are detected from spatio-temporal extrema of the following spatio-temporal Harris measure:
The determinant of the Hessian operator has been extended to joint space-time by Willems et al and Lindeberg, leading to the following scale-normalized differential expression:
In the work by Willems et al, a simpler expression corresponding to and was used. In Lindeberg, it was shown that and implies better scale selection properties in the sense that the selected scale levels obtained from a spatio-temporal Gaussian blob with spatial extent and temporal extent will perfectly match the spatial extent and the temporal duration of the blob, with scale selection performed by detecting spatio-temporal scale-space extrema of the differential expression.
The Laplacian operator has been extended to spatio-temporal video data by Lindeberg, leading to the following two spatio-temporal operators, which also constitute models of receptive fields of non-lagged vs. lagged neurons in the LGN:
For the first operator, scale selection properties call for using and , if we want this operator to assume its maximum value over spatio-temporal scales at a spatio-temporal scale level reflecting the spatial extent and the temporal duration of an onset Gaussian blob. For the second operator, scale selection properties call for using and , if we want this operator to assume its maximum value over spatio-temporal scales at a spatio-temporal scale level reflecting the spatial extent and the temporal duration of a blinking Gaussian blob.
Colour extensions of spatio-temporal interest point detectors have been investigated by Everts et al.
Scale-space interest points based on the Lindeberg Hessian feature strength measures
Lindeberg (2013, 2015) proposed to define four feature strength measures from the Hessian matrix in related ways as the Harris and Shi-and-Tomasi operators are defined from the structure tensor (second-moment matrix).
Specifically, he defined the following unsigned and signed Hessian feature strength measures:
\begin{cases}
t^2 \, (\det H L - k \, \operatorname{trace}^2 H L)
& \mbox{if} \, \det H L - k \, \operatorname{trace}^2 H L > 0 \\
0 & \mbox{otherwise}
\end{cases}
\begin{cases}
t^2 \, (\det H L - k \, \operatorname{trace}^2 H L)
& \mbox{if} \, \det H L - k \, \operatorname{trace}^2 H L > 0 \\
t^2 \, (\det H L + k \, \operatorname{trace}^2 H L)
& \mbox{if} \, \det H L + k \, \operatorname{trace}^2 H L < 0 \\
0 & \mbox{otherwise}
\end{cases}
\begin{cases}
t \, \lambda_1(H L)
& \mbox{if} \, |\lambda_1(H L)| < |\lambda_2(H L)| \\
t \, \lambda_2(H L)
& \mbox{if} \, |\lambda_2(H L)| < |\lambda_1(H L)| \\
t \, (\lambda_1(H L) + \lambda_2(H L))/2 & \mbox{otherwise}
\end{cases}
where and
denote the trace and the determinant of the Hessian matrix of the scale-space representation at any scale ,
whereas
denote the eigenvalues of the Hessian matrix.
or scale linking. Furthermore, the signed and unsigned Hessian feature strength measures and were combined with complementary thresholding on .
Affine-adapted interest point operators
The Wang and Brady corner detection algorithm
C = \left(\frac{\delta^2 I}{\delta \mathbf{t}^2}\right)^2 - c|\nabla I|^2,
The SUSAN corner detector
c(\vec{m}) = e^{-\left(\frac{I(\vec{m}) - I(\vec{m}_0)}{t}\right)^6}
n(M) = \sum_{\vec{m}\in M} c(\vec{m})
R(M) = \begin{cases}
g - n(M) & \mbox{if}\ n(M) < g\\
0 & \mbox{otherwise,}
\end{cases}
The Trajkovic and Hedley corner detector
r(\vec{c}) = \min_{\vec{p} \in P} \left(\left(I(\vec{p}) - I(\vec{c})\right)^2 + \left(I(\vec{p}') - I(\vec{c})\right) ^2\right)
AST-based feature detectors
Automatic synthesis of detectors
Spatio-temporal interest point detectors
A = \sum_u \sum_v \sum_w h(u,v, w)
\begin{bmatrix}
L_x(u,v,w)^2 & L_x(u,v,w) L_y(u,v,w) & L_x(u,v,w) L_t(u,v,w) \\
L_x(u,v,w) L_y(u,v,w) & L_y(u,v,w)^2 & L_y(u,v,w) L_t(u,v,w) \\
L_x(u,v,w) L_t(u,v,w) & L_y(u,v,w) L_t(u,v,w) & L_t(u,v,w)^2 \\
\end{bmatrix}
=
\begin{bmatrix}
\langle L_x^2 \rangle & \langle L_x L_y \rangle & \langle L_x L_t \rangle\\
\langle L_x L_y \rangle & \langle L_y^2 \rangle & \langle L_y L_t \rangle\\
\langle L_x L_t \rangle & \langle L_y L_t \rangle & \langle L_t^2 \rangle\\
\end{bmatrix}
H = \det(\mu) - \kappa \, \operatorname{trace}^2(\mu).
\det(H_{(x,y,t),\mathrm{norm}} L)
= \, s^{2 \gamma_s} \tau^{\gamma_{\tau}}
\left( L_{xx} L_{yy} L_{tt} + 2 L_{xy} L_{xt} L_{yt}
- L_{xx} L_{yt}^2 - L_{yy} L_{xt}^2 - L_{tt} L_{xy}^2 \right).
\partial_{t,\mathrm{norm}} (\nabla_{(x,y),\mathrm{norm}}^2 L) = s^{\gamma_s} \tau^{\gamma_{\tau}/2} (L_{xxt} + L_{yyt}),
\partial_{tt,\mathrm{norm}} (\nabla_{(x,y),\mathrm{norm}}^2 L) = s^{\gamma_s} \tau^{\gamma_{\tau}} (L_{xxtt} + L_{yytt}).
Bibliography
Reference implementations
See also
External links
|
|